Open Domain Short Text Conceptualization: A Generative + Descriptive Modeling Approach
نویسندگان
چکیده
Concepts embody the knowledge to facilitate our cognitive processes of learning. Mapping short texts to a large set of open domain concepts has gained many successful applications. In this paper, we unify the existing conceptualization methods from a Bayesian perspective, and discuss the three modeling approaches: descriptive, generative, and discriminative models. Motivated by the discussion of their advantages and shortcomings, we develop a generative + descriptive modeling approach. Our model considers term relatedness in the context, and will result in disambiguated conceptualization. We show the results of short text clustering using a news title data set and a Twitter message data set, and demonstrate the effectiveness of the developed approach compared with the state-of-the-art conceptualization and topic modeling approaches.
منابع مشابه
Short Text Conceptualization Using a Probabilistic Knowledgebase
Most text mining tasks, including clustering and topic detection, are based on statistical methods that treat text as bags of words. Semantics in the text is largely ignored in the mining process, and mining results often have low interpretability. One particular challenge faced by such approaches lies in short text understanding, as short texts lack enough content from which statistical conclu...
متن کاملThe Domain of the semantics of ‘promise’ in the Holy Quran
Semantics is a part of linguistic by which it can be analyzed the meaning of the words and sentences of a text and identified the part of speech with regard to semantics. This is a descriptive-analytic research and it deals with studying the meaning of ‘promise’ in the Holy Quran based on principles of semantics with a collocation approach by library methodology. Also, by virtue of ...
متن کاملEvaluating Generative Models for Text Generation
Generating human quality text is a challenging problem because of ambiguity of meaning and difficulty in modeling long term semantic connections. Recurrent Neural Networks (RNNs) have shown promising results in this problem domain, with the most common approach to its training being to maximize the log predictive likelihood of each true token in the training sequence given the previously observ...
متن کاملConceptualization of business excellence model with a grand theory approach
This study aims to conceptualize business excellence model and identify its variables and indicators. The philosophical foundations of the pragmatic, humanistic theory of symbolic interactionism has been, quietly and the strategy of grand theory deal with open, axial and selective coding; whose output is a new concept. Data collection is based on documentation studies on excellence models. The ...
متن کاملمدلسازی شبکههای چندترمیناله در سیستمهای قدرت با استفاده از روش برازش برداری بهمنظور تحلیل حالات گذرای الکترومغناطیسی فرکانس بالا
Modeling of frequency-dependent components of power system is often based on a terminal description by an admittance matrix in the frequency domain. One challenge in the extraction of state-space models from such data is to prevent possible error magnification when the model is to be applied in time-domain simulations. The error magnification is a consequence of inaccurate representation of sma...
متن کامل